Enriching the sequence substitution matrix by structural information.
Abstract
A fundamental step in homology modeling is the comparison of two protein sequences: a probe sequence with unknown structure and function, and a template sequence for which the structure and function are known. The detection of protein similarities relies on a substitution matrix that scores the proximity of the aligned amino acids. Sequence-to-sequence alignments use symmetric substitution matrices, whereas threading protocols use asymmetric matrices that test the fitness of the probe sequence to the structure of the template protein. We propose a linear combination of threading and sequence-alignment scoring functions to produce a single (mixed) scoring table. By fitting a single parameter (the relative contribution of the BLOSUM 50 matrix and the threading energy table of THOM2), we obtain a significant increase in prediction capacity in the twilight zone of homology modeling (detecting sequences with <25% sequence identity and with very similar structures). On a difficult test set of 176 homologous pairs with no signal of sequence similarity, the mixed model detects between 40% and 100% more protein pairs than pure threading does. Surprisingly, the linear combination of the two models performs better than either threading or sequence alignment alone when the percentage of sequence identity is low. Finally, we suggest that further enrichment of substitution matrices, combining more structural descriptors such as exposed surface area or secondary structure, is expected to enhance the signal as well.
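The mixing idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact functional form, so everything here is an assumption: the weighting `(1 - lam) * sequence + lam * threading`, the sign convention (energies are negated so that higher always means better), the site labels `buried` and `exposed`, and all numeric values, which are made-up toy numbers and not entries of BLOSUM 50 or THOM2.

```python
from typing import Dict, Tuple

# Toy symmetric sequence substitution scores (higher = more similar).
# Made-up values, NOT taken from BLOSUM 50.
SEQ_SCORE: Dict[Tuple[str, str], float] = {
    ("A", "A"): 5.0, ("A", "L"): -2.0,
    ("L", "A"): -2.0, ("L", "L"): 5.0,
}

# Toy asymmetric threading energies: probe residue placed in a template
# site type (lower = better fit). Site labels and values are hypothetical,
# NOT taken from THOM2.
THREAD_ENERGY: Dict[Tuple[str, str], float] = {
    ("A", "buried"): -0.5, ("A", "exposed"): 0.2,
    ("L", "buried"): -1.5, ("L", "exposed"): 1.0,
}


def mixed_score(probe_res: str, template_res: str,
                template_site: str, lam: float) -> float:
    """Linear mix of a sequence score and a threading score.

    lam = 0 gives pure sequence alignment scoring, lam = 1 gives pure
    threading. The (1 - lam, lam) weighting is an assumed form.
    """
    seq = SEQ_SCORE[(probe_res, template_res)]
    # Negate the energy so that both terms reward good matches.
    thread = -THREAD_ENERGY[(probe_res, template_site)]
    return (1.0 - lam) * seq + lam * thread


def build_mixed_table(lam: float) -> Dict[Tuple[str, str, str], float]:
    """Precompute one mixed scoring table for a fixed mixing parameter."""
    sites = sorted({site for _, site in THREAD_ENERGY})
    return {
        (a, b, s): mixed_score(a, b, s, lam)
        for (a, b) in SEQ_SCORE
        for s in sites
    }


if __name__ == "__main__":
    table = build_mixed_table(lam=0.5)
    for key in sorted(table):
        print(key, round(table[key], 3))
```

In this sketch the single fitted quantity is `lam`; in practice it would be chosen by scanning values and measuring how many known homologous pairs are recovered on a benchmark, which is not shown here.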
Similar resources
A 3D-1D substitution matrix for protein fold recognition that includes predicted secondary structure of the sequence.
In protein fold recognition, a probe amino acid sequence is compared to a library of representative folds of known structure to identify a structural homolog. In cases where the probe and its homolog have clear sequence similarity, traditional residue substitution matrices have been used to predict the structural similarity. In cases where the probe is sequentially distant from its homolog, we ...
A Protein Structural Alphabet and Its Substitution Matrix CLESUM
By using a mixture model for the density distribution of the three pseudobond angles formed by Cα atoms of four consecutive residues, the local structural states are discretized as 17 conformational letters of a protein structural alphabet. This coarse-graining procedure converts a 3D structure to a 1D code sequence. A substitution matrix between these letters is constructed based on the struct...
Hydropathy Conformational Letter and its Substitution Matrix HP-CLESUM: an Application to Protein Structural Alignment
Motivation: Protein sequence world is discrete as 20 amino acids (AA) while its structure world is continuous, though can be discretized into structural alphabets (SA). In order to reveal the relationship between sequence and structure, it is interesting to consider both AA and SA in a joint space. However, such space has too many parameters, so the reduction of AA is necessary to bring down th...
Structural alignment from sequence alone by integration of predicted secondary structure
Protein sequence alignment is at the core of a variety of fundamental tasks such as homology modeling, fold recognition, evolutionary studies and more. Much attention was dedicated over recent years for alignment of sequences in the twilight zone, where pairwise sequence alignment methods (e.g. BLAST) perform poorly. State of the art methods employ sequence profiles, and sometimes meta-sequence...
Journal title:
- Proteins
Volume 54, Issue 1
Pages: -
Publication date: 2004